NSF PAR Search | NSF Public Access Repository


ADO: Automatic Data Optimization for Inputs in LLM Prompts

https://doi.org/10.18653/v1/2025.findings-acl.1340

Lin, Sam; Hua, Wenyue; Li, Lingyao; Wang, Zhenting; Zhang, Yongfeng (October 2025, Association for Computational Linguistics)

Free, publicly-accessible full text available October 3, 2026

Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities

https://doi.org/10.18653/v1/2025.findings-acl.983

Hua, Wenyue; Zhu, Kaijie; Li, Lingyao; Fan, Lizhou; Jin, Mingyu; Lin, Shuhang; Xue, Haochen; Li, Zelong; Wang, Jindong; Zhang, Yongfeng (October 2025, Association for Computational Linguistics)

Free, publicly-accessible full text available October 3, 2026

“HOT” ChatGPT: The Promise of ChatGPT in Detecting and Discriminating Hateful, Offensive, and Toxic Comments on Social Media

https://doi.org/10.1145/3643829

Li, Lingyao; Fan, Lizhou; Atreja, Shubham; Hemphill, Libby (May 2024, ACM Transactions on the Web)

Harmful textual content is pervasive on social media, poisoning online communities and negatively impacting participation. A common approach to this issue is developing detection models that rely on human annotations. However, the tasks required to build such models expose annotators to harmful and offensive content and may require significant time and cost to complete. Generative AI models have the potential to understand and detect harmful textual content. We used ChatGPT to investigate this potential and compared its performance with MTurker annotations for three frequently discussed concepts related to harmful textual content on social media: Hateful, Offensive, and Toxic (HOT). We designed five prompts to interact with ChatGPT and conducted four experiments eliciting HOT classifications. Our results show that ChatGPT can achieve an accuracy of approximately 80% when compared to MTurker annotations. Specifically, the model displays a more consistent classification for non-HOT comments than HOT comments compared to human annotations. Our findings also suggest that ChatGPT classifications align with the provided HOT definitions. However, ChatGPT classifies “hateful” and “offensive” as subsets of “toxic.” Moreover, the choice of prompts used to interact with ChatGPT impacts its performance. Based on these insights, our study provides several meaningful implications for employing ChatGPT to detect HOT content, particularly regarding the reliability and consistency of its performance, its understanding and reasoning of the HOT concept, and the impact of prompts on its performance. Overall, our study provides guidance on the potential of using generative AI models for moderating large volumes of user-generated textual content on social media.
Full Text Available

DataChat: Prototyping a Conversational Agent for Dataset Search and Visualization

https://doi.org/10.1002/pra2.820

Fan, Lizhou; Lafia, Sara; Li, Lingyao; Yang, Fangyuan; Hemphill, Libby (October 2023, Proceedings of the Association for Information Science and Technology)

Data users need relevant context and research expertise to effectively search for and identify relevant datasets. Leading data providers, such as the Inter‐university Consortium for Political and Social Research (ICPSR), offer standardized metadata and search tools to support data search. Metadata standards emphasize the machine‐readability of data and its documentation. There are opportunities to enhance dataset search by improving users' ability to learn about, and make sense of, information about data. Prior research has shown that context and expertise are two main barriers users face in effectively searching for, evaluating, and deciding whether to reuse data. In this paper, we propose a novel chatbot‐based search system, DataChat, that leverages a graph database and a large language model to provide novel ways for users to interact with and search for research data. DataChat complements data archives' and institutional repositories' ongoing efforts to curate, preserve, and share research data for reuse by making it easier for users to explore and learn about available research data.
Full Text Available
